RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning
Khalil, Khurram, Khaliq, Muhammad Mahad, Hoque, Khaza Anuarul
Abstract--The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces RIFT (Reinforcement Learning-guided Intelligent Fault Targeting), a scalable framework that automates the discovery of minimal, high-impact fault scenarios for efficient design-time fault assessment. RIFT transforms the complex search for worst-case faults into a sequential decision-making problem, combining hybrid sensitivity analysis for search-space pruning with reinforcement learning to intelligently generate minimal, high-impact test suites. Evaluated on billion-parameter Large Language Model (LLM) workloads using NVIDIA A100 GPUs, RIFT achieves a 2.2× fault assessment speedup over evolutionary methods and reduces the required test vector volume by over 99% compared to random fault injection, all while achieving superior fault coverage. The proposed framework also provides actionable data to enable intelligent hardware protection strategies, demonstrating that RIFT-guided selective error correction code provides a 12.8× improvement in cost-effectiveness (coverage per unit area) compared to uniform triple modular redundancy protection. RIFT automatically generates UVM-compliant verification artifacts, ensuring its findings are directly actionable and integrable into commercial RTL verification workflows.
The recent advent of Large Language Models (LLMs) with hundreds of billions of parameters has had a transformative impact on computing, but has also introduced unprecedented computational demands [1].
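The abstract's core idea, pruning the fault space with sensitivity scores and then letting an RL agent learn which fault sites matter, can be caricatured in a few lines. Everything below is an illustrative stand-in: the layer names, sensitivity scores, threshold, and bandit-style reward are invented, not taken from RIFT's actual implementation.

```python
import random

# Hypothetical per-site sensitivity scores, standing in for the paper's
# hybrid sensitivity analysis; the names and values are made up.
sensitivity = {"attn.0": 0.9, "mlp.0": 0.7, "attn.5": 0.2, "mlp.9": 0.1}

# Search-space pruning: keep only sites above a sensitivity threshold.
candidates = [s for s, v in sensitivity.items() if v >= 0.5]

def impact(site):
    # Stand-in for running a fault-injection campaign and measuring
    # output degradation; here we simply reuse the sensitivity score.
    return sensitivity[site]

# Tabular, bandit-style Q-learning over which fault site to target next.
q = {s: 0.0 for s in candidates}
alpha, epsilon = 0.5, 0.1
random.seed(0)
for episode in range(200):
    if random.random() < epsilon:
        site = random.choice(candidates)   # explore
    else:
        site = max(q, key=q.get)           # exploit current estimate
    reward = impact(site)
    q[site] += alpha * (reward - q[site])  # one-step value update

best = max(q, key=q.get)
print(best, round(q[best], 3))
```

With the toy reward above, the Q-values simply converge toward each site's impact score, so the agent concentrates its test budget on the highest-impact site, which is the intuition behind generating a minimal, high-impact test suite.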
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning
Khalil, Khurram, Hoque, Khaza Anuarul
Abstract--Generative Artificial Intelligence models like Large Language Models (LLMs) and Vision Language Models (VLMs) exhibit state-of-the-art performance across a wide range of tasks but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs), posing a serious risk to their security in safety-critical applications. Existing BFA discovery methods--gradient-based, static analysis, and search-based--lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, a reinforcement learning (RL)-based, architecture-agnostic framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets capable of inducing catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only LLMs (GPT-2 Large, LLaMA 3.1 8B, and DeepSeek-V2 7B) and VLMs such as LLaVA 1.6, and datasets such as MMLU, MMLU-Pro, VQA v2, and TextVQA. Our results show that FlipLLM can identify critical bits that are vulnerable to BFAs up to 2.5× faster than SOTA methods. We demonstrate that flipping the FlipLLM-identified bits causes the accuracy of LLaMA 3.1 8B to plummet from 69.9% to 0.2%, and LLaVA's VQA score to drop from 78% to almost 0%, by flipping as few as 5 and 7 bits, respectively. Further analysis shows that applying standard hardware protection mechanisms, such as ECC SECDED, to the FlipLLM-identified bit locations completely mitigates the BFA impact, demonstrating the practical value of our framework for guiding hardware-level defenses.
FlipLLM offers the first scalable and adaptive methodology for exploring the BFA vulnerability of both language and multimodal foundation models, paving the way for comprehensive hardware-security evaluation.
Generative Artificial Intelligence models like Large Language Models (LLMs) [1] and Vision Language Models (VLMs) represent a transformative advancement in artificial intelligence, finding integration into mission-critical systems spanning healthcare, finance, and autonomous navigation [2], [3]. Their effective deployment mandates reliable and secure operation across diverse hardware infrastructures, from expansive cloud accelerators to resource-constrained edge devices.
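The attack primitive behind a BFA is a single-bit corruption of a stored weight. A minimal sketch of that primitive (the weight value and bit position below are illustrative; FlipLLM's contribution is *finding which bits* to flip, not the flip itself):

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a value's IEEE-754 float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]

w = 0.5
# Bit 30 is the top exponent bit of a float32; flipping it turns a small
# weight into an astronomically large one -- the kind of single-bit
# corruption that makes a handful of flips catastrophic for a model.
corrupted = flip_bit(w, 30)
print(w, "->", corrupted)
```

Flipping the same bit again restores the original value, which is also why targeted ECC SECDED protection of the identified locations, as the abstract reports, fully neutralises the attack: a single corrected bit per word is exactly the fault class involved.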
- North America > United States > Missouri > Boone County > Columbia (0.04)
- Europe > Netherlands (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)
Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders
Ma, Xiaoyu, Zhang, Likun, Wikle, Christopher K.
Extreme weather events are widely studied in fields such as agriculture, ecology, and meteorology. The spatio-temporal co-occurrence of extreme events can strengthen or weaken under changing climate conditions. In this paper, we propose a novel approach to model spatio-temporal extremes by integrating climate indices via a conditional variational autoencoder (cXVAE). A convolutional neural network (CNN) is embedded in the decoder to convolve climatological indices with the spatial dependence within the latent space, thereby allowing the decoder to depend on the climate variables. There are three main contributions here. First, we demonstrate through extensive simulations that the proposed conditional XVAE accurately emulates spatial fields and recovers spatially and temporally varying extremal dependence with very low computational cost post training. Second, we provide a simple, scalable approach to detecting condition-driven shifts and whether the dependence structure is invariant to the conditioning variable. Third, when dependence is found to be condition-sensitive, the conditional XVAE supports counterfactual experiments, allowing one to intervene on the climate covariate and propagate the associated change through the learned decoder to quantify differences in joint tail risk, co-occurrence ranges, and return metrics. To demonstrate the practical utility and performance of the model in real-world scenarios, we apply our method to analyze the monthly maximum Fire Weather Index (FWI) over eastern Australia from 2014 to 2024 conditioned on the El Niño/Southern Oscillation (ENSO) index.
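The counterfactual mechanism described above, holding the latent state fixed while intervening on the climate covariate, can be sketched with a toy linear decoder. The field size, weights, and covariate values below are invented stand-ins for the trained CNN decoder and the ENSO index:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the learned decoder: a latent vector z and a scalar
# climate covariate c (e.g. an ENSO index) map to a spatial field over
# n_sites locations. Weights are random here; in the paper they come
# from training the conditional XVAE.
n_sites, n_latent = 16, 4
W_z = rng.normal(size=(n_sites, n_latent))
w_c = rng.normal(size=n_sites)

def decode(z, c):
    # Conditioning: the covariate shifts the field site-by-site, so the
    # simulated joint behaviour depends on c.
    return W_z @ z + w_c * c

z = rng.normal(size=n_latent)
field_nino = decode(z, c=1.5)   # strong El Niño scenario
field_nina = decode(z, c=-1.5)  # strong La Niña scenario

# Counterfactual contrast: same latent state, intervened covariate.
delta = field_nino - field_nina
print(delta[:4])
```

With the same `z`, the difference between the two decoded fields isolates the effect of the intervention on `c`, which is the quantity one would then summarise into tail-risk or co-occurrence contrasts.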
- North America > United States > Missouri > Boone County > Columbia (0.14)
- Oceania > Australia > New South Wales (0.04)
- Oceania > Australia > Queensland (0.04)
- (2 more...)
Artificial Intelligence Competence of K-12 Students Shapes Their AI Risk Perception: A Co-occurrence Network Analysis
Heilala, Ville, Sikström, Pieta, Setälä, Mika, Kärkkäinen, Tommi
As artificial intelligence (AI) becomes increasingly integrated into education, understanding how students perceive its risks is essential for supporting responsible and effective adoption. This research aimed to examine the relationships between perceived AI competence and risks among Finnish K-12 upper secondary students (n = 163) by utilizing a co-occurrence analysis. Students reported their self-perceived AI competence and concerns related to AI across systemic, institutional, and personal domains. The findings showed that students with lower competence emphasized personal and learning-related risks, such as reduced creativity, lack of critical thinking, and misuse, whereas higher-competence students focused more on systemic and institutional risks, including bias, inaccuracy, and cheating. These differences suggest that students' self-reported AI competence is related to how they evaluate both the risks and opportunities associated with artificial intelligence in education (AIED). The results of this study highlight the need for educational institutions to incorporate AI literacy into their curricula, provide teacher guidance, and inform policy development to ensure personalized opportunities for utilization and equitable integration of AI into K-12 education.
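At its core, the co-occurrence analysis reduces to counting how often pairs of risk codes appear in the same student response and using those counts as network edge weights. A minimal sketch with invented responses (the actual codes are derived from the Finnish survey answers):

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded responses: each set lists the risk codes assigned
# to one student's answer; these examples are illustrative only.
responses = [
    {"reduced creativity", "misuse"},
    {"bias", "inaccuracy", "cheating"},
    {"reduced creativity", "lack of critical thinking"},
    {"bias", "cheating"},
]

# Edge weight = number of students in whose response both codes co-occur.
cooccur = Counter()
for risks in responses:
    for a, b in combinations(sorted(risks), 2):
        cooccur[(a, b)] += 1

for pair, weight in cooccur.most_common(3):
    print(pair, weight)
```

Splitting the responses by self-reported competence level and building one such network per group is what lets the study contrast which risk clusters dominate for lower- versus higher-competence students.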
- North America > United States > District of Columbia > Washington (0.05)
- Europe > Finland > Central Finland > Jyväskylä (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Applied AI (0.95)
- Information Technology > Artificial Intelligence > Natural Language (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Health App Reviews for Privacy & Trust (HARPT): A Corpus for Analyzing Patient Privacy Concerns, Trust in Providers and Trust in Applications
Kelly, Timoteo, Korkmaz, Abdulkadir, Mallet, Samuel, Souders, Connor, Aliakbarpour, Sadra, Rao, Praveen
Background: User reviews of Telehealth and Patient Portal mobile applications (apps), hereafter referred to as electronic health (eHealth) apps, are a rich source of unsolicited patient feedback, revealing critical insights into patient perceptions. However, the lack of large-scale, annotated datasets specific to privacy and trust has limited the ability of researchers to systematically analyze these concerns using natural language processing (NLP) techniques. Objective: This study aims to develop and benchmark Health App Reviews for Privacy & Trust (HARPT), a large-scale annotated corpus of patient reviews from eHealth apps, to advance research in patient privacy and trust. Methods: We employed a multistage data construction strategy that integrated keyword-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers. A curated subset of 7,000 reviews was manually annotated to support machine learning model development and evaluation. The resulting dataset was used to benchmark a broad range of models. Results: The HARPT corpus comprises 480,000 patient reviews annotated across seven categories capturing critical aspects of trust in the application (TA), trust in the provider (TP), and privacy concerns (PC). We provide comprehensive benchmark performance for a range of machine learning models on the manually annotated subset, establishing a baseline for future research. Conclusions: The HARPT corpus is a significant resource for advancing the study of privacy and trust in the eHealth domain. By providing a large-scale, annotated dataset and initial benchmarks, this work supports reproducible research in usable privacy and trust within health informatics. HARPT is released under an open resource license.
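The keyword-based filtering stage can be sketched as a coarse first-pass labeler that routes reviews toward the corpus's label families. The keyword lists and example reviews below are illustrative inventions, not drawn from the HARPT corpus, and only two of the label families (PC, TA) are shown:

```python
# Illustrative keyword lists; the real pipeline used curated lists plus
# iterative manual labeling and transformer-based weak supervision.
PRIVACY_KEYWORDS = {"privacy", "data", "tracking", "share my"}
TRUST_KEYWORDS = {"trust", "reliable", "secure"}

def candidate_labels(review: str) -> set:
    """First-pass candidate labels for one review (coarse keyword match)."""
    text = review.lower()
    labels = set()
    if any(k in text for k in PRIVACY_KEYWORDS):
        labels.add("PC")  # privacy concern
    if any(k in text for k in TRUST_KEYWORDS):
        labels.add("TA")  # trust in the application
    return labels

reviews = [
    "Why does this app share my data with third parties?",
    "Very reliable portal, I trust it with my records.",
    "Crashes every time I open it.",
]
print([candidate_labels(r) for r in reviews])
```

Reviews matching no keyword (like the third one) are exactly the cases the later manual-annotation and weak-supervision stages exist to recover, since keyword filtering alone has low recall.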
- North America > United States > Missouri > Boone County > Columbia (0.15)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
- Information Technology (1.00)
- Health & Medicine > Health Care Technology > Telehealth (1.00)
Privacy Preserving In-Context-Learning Framework for Large Language Models
Bhusal, Bishnu, Acharya, Manoj, Kaur, Ramneet, Samplawski, Colin, Roy, Anirban, Cobb, Adam D., Chadha, Rohit, Jha, Susmit
Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information leakage, where adversaries can extract sensitive information embedded in the prompts. In this work, we introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees. Our approach leverages the Differential Privacy (DP) framework to ensure worst-case theoretical bounds on information leakage without requiring any fine-tuning of the underlying models. The proposed method performs inference on private records and aggregates the resulting per-token output distributions. This enables the generation of longer and coherent synthetic text while maintaining privacy guarantees. Additionally, we propose a simple blending operation that combines private and public inference to further enhance utility. Empirical evaluations demonstrate that our approach outperforms previous state-of-the-art methods on in-context-learning (ICL) tasks, making it a promising direction for privacy-preserving text generation while maintaining high utility. Our code is available at https://github.com/bhusalb/
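The core private-prediction step described above, run inference on each private record, aggregate the per-token output distributions, and release only a noised aggregate, can be sketched as follows. The vocabulary size, distributions, and noise scale are toy values, and the calibration of the noise scale to a formal (ε, δ) guarantee is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-record next-token distributions over a 5-token vocabulary; in
# the paper each row would come from running the LLM on one private record.
per_record = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.65, 0.15, 0.10, 0.05, 0.05],
])

sigma = 0.05  # illustrative noise scale; (epsilon, delta) calibration omitted

# Gaussian-mechanism-style aggregation: average the distributions, add
# noise, clip, and renormalise before choosing the next token.
mean = per_record.mean(axis=0)
noisy = np.clip(mean + rng.normal(scale=sigma, size=mean.shape), 1e-9, None)
private_dist = noisy / noisy.sum()

next_token = int(np.argmax(private_dist))
print(next_token, private_dist.round(3))
```

Because each record contributes only 1/n of the averaged distribution, the sensitivity of the aggregate is bounded, which is what lets additive noise yield a worst-case leakage bound without fine-tuning the underlying model; the blending with public inference mentioned in the abstract would mix this distribution with a non-private one to recover utility.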
- North America > United States > Missouri > Boone County > Columbia (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
Active Learning and Explainable AI for Multi-Objective Optimization of Spin Coated Polymers
Young, Brendan, Alvey, Brendan, Werbrouck, Andreas, Murphy, Will, Keller, James, Young, Matthias J., Maschmann, Matthew
Spin coating polymer thin films to achieve specific mechanical properties is inherently a multi-objective optimization problem. We present a framework that integrates an active Pareto front learning algorithm (PyePAL) with visualization and explainable AI techniques to optimize processing parameters. PyePAL uses Gaussian process models to predict objective values (hardness and elasticity) from the design variables (spin speed, dilution, and polymer mixture), guiding the adaptive selection of samples toward promising regions of the design space. To enable interpretable insights into the high-dimensional design space, we utilize UMAP (Uniform Manifold Approximation and Projection) for two-dimensional visualization of the Pareto front exploration. Additionally, we incorporate fuzzy linguistic summaries, which translate the learned relationships between process parameters and performance objectives into linguistic statements, thus enhancing the explainability and understanding of the optimization results. Experimental results demonstrate that our method efficiently identifies promising polymer designs, while the visual and linguistic explanations facilitate expert-driven analysis and knowledge discovery.
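Underlying PyePAL's adaptive sampling is the notion of Pareto optimality over the two maximised objectives. A minimal sketch of Pareto-front extraction on synthetic (hardness, elasticity) points, with the Gaussian-process modelling and adaptive selection omitted:

```python
# Synthetic (hardness, elasticity) measurements; both objectives maximised.
points = [
    (3.0, 1.0), (2.5, 2.0), (1.0, 3.0), (2.0, 1.5), (0.5, 0.5),
]

def is_dominated(p, others):
    # p is dominated if some other point is at least as good in both
    # objectives and strictly better in at least one.
    return any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in others)

pareto = [p for p in points if not is_dominated(p, points)]
print(sorted(pareto))
```

PyePAL's contribution is doing this classification *before* exhaustive measurement, using GP uncertainty to decide which spin-coating recipe to try next; the UMAP projection and fuzzy linguistic summaries then explain where on this front each design sits.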
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- North America > United States > Missouri > Boone County > Columbia (0.13)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (2 more...)
- Research Report (1.00)
- Workflow (0.67)
- North America > United States > Rhode Island > Providence County > Providence (0.14)
- North America > United States > Missouri > Boone County > Columbia (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- (7 more...)
- North America > United States > Minnesota (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > Missouri > Boone County > Columbia (0.04)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)